Chromosome‐level genome assembly of the black widow spider Latrodectus elegans illuminates composition and evolution of venom and silk proteins

Abstract Background The black widow spider has both extraordinarily neurotoxic venom and three-dimensional cobwebs composed of diverse types of silk. However, a high-quality reference genome for the black widow spider has been unavailable, which has hindered in-depth understanding and application of this valuable biomass. Findings We assembled the Latrodectus elegans genome, with a genome size of 1.57 Gb, a contig N50 of 4.34 Mb and a scaffold N50 of 114.31 Mb. Hi-C scaffolding assigned 98.08% of the genome to 14 pseudo-chromosomes, and BUSCO completeness analysis revealed that 98.4% of the core eukaryotic genes were completely present in this genome. Annotation identified 506.09 Mb (32.30%) of repetitive sequences and 20,167 protein-coding genes; specifically, we identified 55 toxin genes and 26 spidroin genes and provide a preliminary analysis of their composition and evolution. Conclusions We present the first chromosome-level genome assembly of a black widow spider and provide substantial toxin and spidroin gene resources. These high-quality genomic data add valuable resources from a representative spider group and contribute to in-depth exploration of spider genome evolution, especially regarding the diversification of venom and web-weaving patterns. The sequence data are also first-hand templates for further application of the spider biomass.


Hui Xiang
Response to Reviewers: Dear Editor, Thank you and the three reviewers for your time and constructive comments on our manuscript "Chromosome-level genome assembly of the black widow spider Latrodectus elegans illuminates composition and evolution of venom and silk proteins" (GIGA-D-21-00338). These comments have greatly helped us improve our manuscript. We have carefully revised the manuscript according to the reviewers' comments and polished the grammar. We hope that the updated manuscript meets the reviewers' requirements. Our point-by-point responses to each comment from the editor and the reviewers follow (a response letter file is also attached to our submission materials). Editor: Please also take a moment to check our website at https://www.editorialmanager.com/giga/ for any additional comments that were saved as attachments. R: We checked the website and found no additional comments. Thank you for the kind reminder. In addition, please register any new software application in the bio.tools and SciCrunch.org databases to receive RRID (Research Resource Identification Initiative ID) and biotoolsID identifiers, and include these in your manuscript. This will facilitate tracking, reproducibility and re-use of your tool. R: Thank you for the reminder. We checked all the software used in our manuscript and confirmed that no new software was developed. Meanwhile, an electronic notebook-style methods document describing this software was uploaded to the GigaDB dataset. We affirm that our tools are easy to track and that our analyses are reproducible. Please ensure you describe additional experiments that were carried out and include a detailed rebuttal of any criticisms or requested revisions that you disagreed with. Please also ensure that your revised manuscript conforms to the journal style, which can be found in the Instructions for Authors on the journal homepage.
If the data and code has been modified in the revision process please be sure to update the public versions of this too. R: We have addressed all of the above items. Besides, we suggest you find a copy editing company or friendly native English speaker to polish the grammar. R: Our manuscript has been polished by a copy editing company (Letpub).
Reviewer reports: Reviewer #1: The manuscript described the genome of an orb-weaver spider Latrodectus elegans with a combination of Nanopore, Illumina and Hi-C sequencing and the comparison of latrotoxin genes and spidroin genes among spiders. The authors reported the assembly, quality and analysis of the genome in details and provided enormous information, in my opinion, useful to the research community. However, I suggest a major revision before the work is published. The main concern is on the toxin identification. R: Thank you very much for your overall positive comments on our manuscript and for your important suggestion. I have one major concern related to the analysis of venom toxin genes. In Fig. 3 and related text, no ICK family gene was identified in five species including spider, scorpion and tick where ICK toxins have been reported and even functionally studied. In the other five spider species, fewer ICK genes were found (1 to 7). ICK family was considered an abundant toxin family in Arachnida and their homology is mainly dependent on the disulfide-bond instead of the primary sequence. The toxin genes were identified via blastp, which heavily relies on the sequence similarity. Similar concerns may apply to the other toxin families. I would suggest the authors redo the toxin gene analysis. R: Thank you for this important comment, and we apologize for our carelessness in spider toxin identification. We investigated these toxins further and learned that, as the reviewer stated, ICK toxins are small peptides whose homology depends mainly on the disulfide bonds rather than on the primary sequence, unlike proteins of the CRISP or latrotoxin families. We then re-identified the ICK toxins through the
I think the work is very interesting. This new genome provides new data that will be very useful to advance our understanding of the diversification of spiders and the evolution of some of the most relevant aspects of their biology.
Nevertheless, I have some important concerns about different aspects of the ms, which in my opinion must be properly addressed before considering the manuscript suitable for publication in Gigascience. R: Thank you very much for your interest in our manuscript and your instructive comments.
First of all, I think it is necessary to improve the language. I am not a native English speaker, but I think there are many sentences that are not well punctuated and some expressions are not well constructed. I highly recommend a thorough review of the entire text. R: We apologize for the poor writing. We carefully revised the text, and we also had a copy editing company polish the manuscript to improve it further. Apart from the language itself, there are poorly constructed sentences from the point of view of biological concepts. Phrases such as "Systematic identification of complete toxin gene sequence", in line 68, or "..will aid the identification and understanding of the complete spider toxin and spider silk genes..line 76", or "..to detect the genome integrity..line 117", or "transposon elements..line 121"..or "connected into one super gene..line 169", or "all merged values were plotted and grouped by species...line 210", or "the chromosome distribution of spidroin were plotted...line 229", or "..genes that had homolog genes in the public database..line 264", or "the original group of arachnida..line 327"..etc.. R: Line 68: we modified it to: "Systematic identification and analysis of all the spider toxin genes with full length sequences". Line 76: we modified it to: "…will provide important resources for deciphering the spider toxin and spidroin genes". Line 117: we modified it to: "..to evaluate the genome integrity." Line 121: we modified it to: "transposable elements". Line 169: we modified the sentence to: "All the well-aligned single-copy orthologous genes in each species were concatenated to one super gene for each species." Line 210: we modified it to: "all the Ka/Ks values were plotted and grouped by species." We also revised the surrounding text. Line 229: we modified it to: "The chromosome distribution of the spidroins were then plotted ...". Line 264: we modified it to: "...
genes having homologous hits in the public databases." Line 327: we modified it to "... It is the only species from the relatively ancient Mygalomorphae. The result suggests that relatively ancient spider group may have relatively less selection pressure in their habitats. However, due to its low genome quality, this result is pending further verification....". In addition, we carefully checked the whole manuscript and revised it extensively.
Regarding specific contents, there are some things that need to be revised, rewritten or expanded. Here are some examples: In line 137 a sentence begins with "the protein sequences of ...." and then, after the comma, continues with "... the longest transcript were chosen for further analyzes. Something is wrong here ... R: We modified the statement to: "We chose the longest isoform of each gene to obtain the non-redundant protein sequences of each gene set." We also revised the surrounding text. In line 143, the phrase "All the remaining genes were aligned using blastn" is not understood, what do the authors mean? R: We modified the related statements to: "We chose the longest isoform of each gene to obtain the non-redundant protein sequences of each gene set. The protein sequences were used as queries to search for orthologous regions in the L. elegans genome by tblastn with an e-value of 1e-5. The results were subjected to GeneWise software (v2.4.1) (Birney and Durbin 2000) to predict gene structure." Various lines. The names and identifiers of the species used in the different analyzes are unnecessarily repeated in several paragraphs. R: Thank you for pointing this out. We carefully checked the manuscript and removed the repeated names and identifiers of the species. line 150. Please, replace "homologous searching" with "sequence similarity-based searches" or similar... R: Thanks. We replaced "homologous searching" with "homology-based searches". Line 183. "Using ..... gene pairwise relationships". What do the authors mean by this phrase? What pairwise relationships do they use to determine the evolution of families in CAFE? R: We apologize for this vague description. However, to limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the analysis of gene family expansion and contraction. This sentence was therefore removed. line 190.
"To identify the potential positively selected genes (PSGs) and the genes with positively selected sites ..." I do not understand the difference between these two classes of genes. R: We apologize for the unclear description. We used different models in the Codeml tool of the PAML package (v4.8) to detect potential positively selected genes (PSGs) and genes with positively selected sites. PSGs were those identified by the two-ratio branch model. Genes not identified as PSGs were then tested with the branch-site model to detect whether they have positively selected sites. We rewrote the methods of the positive selection analysis as follows: "Single-copy orthologs from the five relatively closely related species of orb-web weaving spiders, i.e., L. elegans, P. tepidariorum, A. bruennichi, T. clavipes and T. antipodiana, as well as their species tree, were used to identify the potential positively selected genes (PSGs) and the genes with positively selected sites in L. elegans, using the branch model and the branch-site model, respectively, in the Codeml tool of the PAML package (v4.8) (Yang 1997). Briefly, the ratio (ω) of nonsynonymous to synonymous nucleotide substitution rates was estimated. The one-ratio model was used to estimate the average ω across the species tree (ω0). For each gene, the two-ratio branch model was used to estimate ω for the foreground branch (L. elegans; ω1) and for all other branches (ω_background). A likelihood ratio test comparing the two-ratio model with the one-ratio model was performed to determine whether the gene is positively selected in the foreground branch (ω1>ω0; ω1>ω_background; ω1>1; P-value<0.05). If the gene was not positively selected, the branch-site model was used to detect amino acid sites likely to be under positive selection in the foreground branch using Bayes Empirical Bayes (BEB) analysis (Yang et al., 2005 Line 132: We are sorry for this mistake. Yes, r8s is an individual program.
Because we rewrote the description of the TE analysis (the insertion time of each TE sequence was estimated by a Kimura distance-based analysis (Chalopin, Naville et al. 2015) using parseRM (https://github.com/4ureliek/Parsing-RepeatMasker-Outputs)), this sentence was removed.
We added updated and revised information to the revised manuscript.
Other concerns: The paragraphs titled Relative evolution calculation and Gene family expansion and contraction in the methods section are poorly written, it is hard to understand what they are doing. R: We apologize for the poor descriptions. However, to limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the analysis of gene family expansion and contraction. This content was therefore removed. Lines 183-184. "to identify gene families significantly expanded or contracted" R: Thank you. However, to limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the analysis of gene family expansion and contraction. This sentence was therefore removed.
In the paragraph titled Positive Selection Analysis in the methods section, please specify what specific evolutionary models have been used. R: We apologize for the vagueness. We rewrote this paragraph and clearly described the models used.
Why are different compilations of RAxML (AVX or PTHREADS) used in different parts of the job? R: Two different compilations were used in our manuscript because the analyses were run on different clusters: "raxmlHPC-PTHREADS-AVX" and "raxmlHPC-PTHREADS". Both are multithreaded PTHREADS builds and give equivalent results; the AVX compilation additionally uses the AVX vector instructions supported by the processor, which makes it faster. Because different parts of the work were run on different processors, we used the corresponding RAxML compilation for each.
Were the 14 scaffolds obtained in the assembly expected? R: Yes, they were. Female black widow spiders are reported to have 28 chromosomes, i.e., 14 chromosome pairs, which matches the 14 pseudo-chromosomes in our assembly. Is there information on karyotypes of this species? R: Although no karyotype is available for this species, there is evidence that female black widow spiders have 28 chromosomes. Thank you for the reminder; we added the reference (Zhao YN, Ayoub A, Hayashi CY. Chromosome mapping of dragline silk genes in the genomes of widow spiders (Araneae, Theridiidae). PLoS One 2010; 5(9): e12804.) and the related statement to the revised manuscript. What software has been used to make the graphs of the syntenic relationships between assemblies? That information is not in methods. R: We apologize for this omission; LAST (v1066) and Circos (v0.69-6) were used. To perform the whole-genome synteny analysis, we used LAST (v1066) to perform whole-genome alignment with the L. elegans assembly as the database: the "lastal" command was first used to obtain MAF-format alignment files, and the "maf-swap" command was then used to sort the alignments and select the best one-to-one blocks. After that, Circos (v0. To what extent is the phylogeny obtained in this work different from those already known? What is the contribution of this part of the work to the knowledge of the diversification of these species? R: The phylogeny obtained in this work is consistent with previously reported patterns. We added this species to the spider tree of life and, in particular, estimated its divergence time from a related theridiid spider, P. tepidariorum. In the paragraph titled "Gene family expansion and contraction of L. elegans" of results and discussion (see comment on lines 183-184 for a more correct title), the p-value <0.05, is a value corrected by the multiple testing the many families, or does it mean that the p value of each family is <0.05? Consider this problem if not.
In fact, much information is lacking on how the turnover rate analysis has been done in this study. Please, expand the method section where this aspect is discussed. R: Thank you for your comments. To limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the analysis of gene family expansion and contraction, and highlighted the analyses of toxin and spidroin genes. Lines 304-307. Please explain the reason for this statement or provide references. R: Thank you for your comments. To limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the analysis of gene family expansion and contraction. This content was therefore removed. Line 311. A. geniculata is not an ancient spider! Perhaps the authors mean that it belongs to an ancient lineage. R: We apologize for this mistake. We agree that A. geniculata should not be called an ancient spider; it belongs to Theraphosidae in Mygalomorphae, a relatively ancient lineage of Araneae. However, to limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the general analysis of homologs. This sentence was therefore removed.
Lines 310-311. "This means that most of the genes have orthologous genes ..." How does this phrase match the fact that the phylogeny has only been done with 156 single-copy genes? Are all the other families with other orthology relationships? R: In evolutionary genomics, homologous genes are divided into orthologs and paralogs, and orthologs are further clustered mainly into single-copy and multi-copy orthologs. In OrthoMCL-based analyses (OrthoMCL being a classical gene family clustering tool), five categories are plotted: single-copy orthologs, multiple-copy orthologs, other orthologs, unique paralogs and unclustered genes. Because of its genome quality, A. geniculata was not included in this description. Thus, most of the genes have orthologs, but only 156 genes are clustered as single-copy orthologs. However, to limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the general analysis of homologs. This sentence was therefore removed. Line 320-321. Could it be the result of differences in the quality of annotations and assemblies rather than unique characters of this species? R: We cannot exclude these possibilities, which could arise from many sources, such as sequencing errors, assembly errors and even computational errors on servers or clusters. However, given our comprehensive comparative and evolutionary genomics analyses and the rapid development of sequencing and software technology, we are confident in the quality of our assembly and annotation.
However, to limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the general analysis of homologs. This sentence was therefore removed. Line 327. ".this original group of arachnida ..", what does original mean? please review. R: We apologize for the mistake. We revised the statements to: "Interestingly, A. geniculata had the slowest evolution rate. It is the only species from the relatively ancient Mygalomorphae. The result suggests that relatively ancient spider group may have relatively less selection pressure in their habitats". Line 332. "...eight positively selected genes and 384 genes with positively selected sites ..". In both cases genes are positively selected. Please better explain the difference and specify which models have been used. R: We apologize for the vague description. We rewrote the methods to clearly indicate how positively selected genes (PSGs) are distinguished from genes with positively selected sites, and added information on the PAML models used. line 343. "..we identified four major spider venom family proteins ..". This sentence is badly constructed. Perhaps the authors meant ... "We identified proteins / copies / members of (the) four major venom gene / protein families" ... R: We apologize for the poor writing. We revised it to: "We identified members of the four major venom gene families". line 348. Please, indicate the p-value of these analyzes. R: Thank you. However, in response to another reviewer's comments, we needed to limit unnecessary genomic analyses. We therefore removed the analysis of gene family expansion and contraction, together with this sentence. line 360. "We analyzed the nucleotide substitutions (Ks) .... and found that L. elegans laxotroxin genes had higher (Ka / Ks)". I don't understand the first (Ks). R: We apologize for this misleading typo; it should be Ka/Ks rather than Ks.
We revised the sentence to: "We analyzed the ratio of nonsynonymous to synonymous nucleotide substitution rates (Ka/Ks ratio) of each latrotoxin gene pair and found that L. elegans latrotoxin genes generally had higher Ka/Ks ratios than those in other species,…". line 371. "relatively more ..." is not a particularly scientific expression. R: We apologize for this unprofessional expression and removed "relatively". Conclusions: I believe that many of the conclusions of this ms are not justified or are not supported by the results. They are more the result of speculation and interpretation. They should focus more on the description of the genome and its characteristics. The last sentences should move to the introduction. R: We apologize for the poor writing. We updated the conclusion: "Using an approach combining Illumina short reads, Nanopore long reads and Hi-C data, we assembled and annotated the first chromosome-level genome (1.57 Gb) of a black widow spider, L. elegans. In this study, we confirmed the phylogenetic position of this species in the spider tree of life. In addition, our analysis of the hox gene family further verified its high quality. Specifically, we focused on toxin and spidroin genes, which contribute to the distinctive features of black widow and cobweb-weaving spiders, provided substantial information on their composition and numbers, and preliminarily demonstrated the evolutionary pattern of one important toxin gene family, the latrotoxins. These important venom toxins contribute greatly to the toxicity of the black widow spider and show fast evolution. Generally, the genome resource will help the in-depth exploration of spider genome evolution, especially the diversification of venom and web-forming. The sequence data are also first-hand templates for further application of the spider biomass."
Reviewer #3: Chromosome-level genome assembly of the black widow spider Latrodectus elegans illuminates composition and evolution of venom and silk proteins This manuscript described the sequencing and assembly of a black widow spider, and a number of analyses comparing the obtained reference genome sequence to other spider genomes. I realize that the manuscript is submitted as 'Data note', but a number of genome analyses are performed, and it is not very clear to me what the overall questions are that the authors want to address. R: Thank you for the reminder. The species used in our study is a black widow spider, well known for its venom. It is also characterized by its distinctive three-dimensional web, called a cobweb. We therefore provided more information on the venom toxins and spidroins of this species to highlight the importance of the genome resource. We noticed that a Data Note recently published in GigaScience presented similar information (e.g., Sheffer MM, Hoppe A, Krehenwinkel H, Uhl G, Kuss AW, Jensen L, Jensen C, Gillespie RG, Hoff KJ and Prost S. Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation. Gigascience 2021;10: 1-12.). In accordance with your comment, we limited unnecessary genomic analyses. For instance, we removed the analysis of gene family expansion and contraction, as well as the general analysis of homologs. We also revised our manuscript to meet the format requirements of a "Data Note". The conclusion in the abstract is not a conclusion; just says 'this data is valuable', but not how/why. R: We revised the entire abstract. Specifically, the revised conclusion reads: "We present the first chromosome-level genome assembly of a black widow spider and provide substantial toxin and spidroin gene resources.
These high-quality genomic data add valuable resources from a distinctive spider group and contribute to in-depth exploration of spider genome evolution, especially regarding the diversification of venom and web-weaving patterns. The sequence data are also first-hand templates for further application of the spider biomass." Line 47: what does it mean 'due to their adaptability'? It sounds like they special with regard to adaptation. Is this so? R: We apologize for the misleading statement. The mechanism underlying the wide distribution of spiders is a major question and cannot simply be attributed to adaptability and diverse behavior, so we have removed this statement. Line 73: the authors say 'full length spidroin sequences have been found' but no references. Here are some options: -https://www.nature.com/articles/ncomms4765 -https://www.nature.com/articles/ng.3852 -https://academic.oup.com/gigascience/article/10/1/giaa148/6067174?login=true R: We had cited two references at the end of the sentence. Thank you for providing additional references. Because the spidroin sequences in "https://www.nature.com/articles/ncomms4765" are not full-length, we added the other two papers to our revised manuscript. Line 309: The A. geniculata genome is of much lower quality than the other genomes. Therefore, the conclusions given in this section should be removed. This should also apply to line 326. R: Thank you for the reminder. To limit the genomic analyses and meet the requirements of a "Data Note" in GigaScience, we removed unnecessary genomic analyses, including the general analysis of homologs. This content was therefore removed. I think that you should cite the papers publishing all the spider genome sequences (plus I. scapularis and C. sculpturatus) that you download and analyze.
Generally, I think that the manuscript uses too few citations. Another example is line 256 where the authors say that the results of their hox gene analysis are similar to others species, but without any references. Actually, I do not think there is a single reference in the result/discussion section, but may have missed it/them!!! R: Thank you for the reminder. We extensively revised the manuscript and added many new references to support our results and discussion. For line 256, we added the references: Schwager

Background
Spiders are a highly diverse and abundant group of predatory arthropods, and more than 49,000 spider species have been described to date [1, 2]. They are found in a wide range of habitats, such as underground caves, tropical rainforests, deserts and glaciers [3][4][5]. Spiders are of special interest because of their distinctive characteristics, such as silk and venom. Spider silk has unique mechanical properties and could potentially be used in the military industry, in medicine and in other fields [6][7][8]. Spider venom has a complex composition and is rich in biologically active substances, which makes it valuable for potential applications as pharmacological tools, reagents, drug precursors and biological pesticides [9,10]. In-depth studies of the biochemical and physical properties of spider silk and venom may require the complete sequences of spider genes, but the lack of high-quality genomic data hinders these studies.
Latrodectus spp. are known as black widow spiders. They are of great interest because of their extraordinarily neurotoxic venom [11,12]. Spider venom is a complex mixture of toxins with different biological activities, ranging from low-molecular-weight compounds to proteins and peptides. More than 100 different chemical components have been identified in spider venom [13,14]. Unlike most other venomous animals, black widow spiders contain toxins not only in the venom glands but throughout the body, including the legs and abdomen. Toxins are also found in spider eggs and newborn offspring. This unique feature makes black widow spider venom components more diverse [15][16][17]. However, studies on black widow toxins are relatively fragmented, and information on the components of black widow venom remains limited.
Systematic identification and analysis of all the spider toxin genes with full-length sequences is now a top priority [18]. In addition, the black widow spider differs from spiders that construct classic two-dimensional aerial capture webs: its webs are three-dimensional and are called cobwebs [19][20][21]. Therefore, genetic deciphering of black widow spider silk provides data and clues for understanding the diversification of spiders. The full-length sequences of some major silk proteins (spidroins) have been identified, but a systematic analysis of the spidroins of cobweb-weaving spiders is still lacking [22-25]. long sequencing reads and assembled to the chromosome level [2,25]. High-quality chromosome-level spider genome resources remain scarce. In this study, we combined Oxford Nanopore technologies and high-throughput chromosome conformation capture sequencing [26,27] to generate a high-quality chromosome-level reference genome for Latrodectus elegans (NCBI: txid2857379) and systematically analyzed venom proteins and spidroins. These

Quality control and genome characteristics evaluation
For the Nanopore long reads, reads with a mean quality > 7 were retained for assembly using an in-house Perl script. For the Illumina short reads, duplicated reads and adaptors were removed. Reads with more than 10% unknown bases, or read pairs with more than 30% low-quality bases, were also excluded, and 5 bp were trimmed from both the head and the tail of each read.
To investigate the genome characteristics of L. elegans, all the filtered short-insert reads were used for k-mer analysis. The genome size was estimated using the formula $G = K_{\mathrm{number}} / K_{\mathrm{depth}}$, where $K_{\mathrm{number}}$ and $K_{\mathrm{depth}}$ represent the total number and the peak depth of 21-mers, respectively. The genome size was estimated with GenomeScope (GenomeScope, RRID:SCR_017014) v2.0 [28], with k set to 21 and other parameters left at their defaults.
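GenomeScope fits a full statistical model to the k-mer histogram; the Python sketch below only illustrates the simpler formula G = K_number/K_depth given above, on simulated error-free reads. K_depth is taken as the histogram mode above an error cutoff; the cutoff (`min_depth`), the simulated genome and all function names are hypothetical and not part of the authors' pipeline.

```python
import random
from collections import Counter

def count_kmers(reads, k=21):
    """Count canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement)."""
    complement = str.maketrans("ACGT", "TGCA")
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers that contain unknown bases
            revcomp = kmer.translate(complement)[::-1]
            counts[min(kmer, revcomp)] += 1
    return counts

def estimate_genome_size(counts, min_depth=3):
    """G = K_number / K_depth.

    K_number: total number of k-mers counted.
    K_depth: depth shared by the most distinct k-mers, ignoring depths below
    min_depth (hypothetical cutoff for the low-depth sequencing-error peak).
    """
    histogram = Counter(counts.values())  # depth -> number of distinct k-mers
    k_number = sum(counts.values())
    k_depth = max((d for d in histogram if d >= min_depth), key=lambda d: histogram[d])
    return k_number / k_depth

# Toy check: ~30x coverage of a random 5 kb "genome" with error-free 150 bp reads.
random.seed(42)
genome = "".join(random.choice("ACGT") for _ in range(5000))
read_len, n_reads = 150, 1000
reads = []
for _ in range(n_reads):
    start = random.randrange(len(genome) - read_len + 1)
    reads.append(genome[start:start + read_len])
estimate = estimate_genome_size(count_kmers(reads, k=21))
print(f"estimated genome size: {estimate:.0f} bp (true size: {len(genome)} bp)")
```

Real data contain heterozygous and repetitive k-mers that shift the histogram, which is why a model-based tool such as GenomeScope is preferred over taking the mode directly.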

Genome assembly and evaluation
To obtain a high-quality genome, all of the filtered Nanopore long reads were assembled into contigs using Nextdenovo (v2.4) [29] with the core parameters -d 40 -g 1.74 g. Single-base errors in the genome assembly were corrected with all the filtered Illumina short reads using NextPolish (v1.3.1) with the parameters rerun = 3 and -max_depth = 100. The Hi-C sequencing reads were mapped to the polished contig assembly to anchor the contigs into chromosomes using the 3D de novo assembly software (v170123) [30].
To evaluate the quality and accuracy of the assembled genome, we used the following three strategies. First, the quality of the assembled genome and gene completeness were assessed using BUSCO (BUSCO, RRID:SCR_015008) v5.2.2 [31] with the eukaryote and metazoan core gene sets, respectively. Second, all the filtered Illumina short reads were mapped to the assembled genome using BWA (BWA, RRID:SCR_010910) v0.7.12-r1039 [32] to evaluate the genome integrity. Third, the transcripts of L. elegans were assembled using Bridger (Bridger, RRID:SCR_017039) (version r2014-12-01) [33] and then mapped to the assembled genome using BLAT.
To perform the synteny analysis, we used LAST (LAST, RRID:SCR_006119) (v1066) [34] for whole-genome alignment with the L. elegans assembly as the database: the "lastal" command was first used to obtain MAF-format alignment files, and the "maf-swap" command was then used to sort the alignments and select the best one-to-one blocks. Circos (Circos, RRID:SCR_011798) v0.69-6 [35] was then used to plot the syntenic relationships.
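Contiguity statistics such as the contig and scaffold N50 and the share of the assembly anchored to pseudo-chromosomes (reported in the Abstract) can be computed directly from the assembly FASTA. The Python sketch below is illustrative only; it assumes that the pseudo-chromosomes are the n longest scaffolds, which is a simplification, and all function names are hypothetical.

```python
def lengths_from_fasta(lines):
    """Return sequence lengths from FASTA-formatted lines (header lines start with '>')."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def nx(lengths, fraction=0.5):
    """Nx: the length L such that sequences of length >= L cover at least `fraction` of the assembly.

    nx(lengths, 0.5) is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no sequences given")

def fraction_in_longest(lengths, n):
    """Share of assembled bases in the n longest scaffolds (assumed to be the pseudo-chromosomes)."""
    return sum(sorted(lengths, reverse=True)[:n]) / sum(lengths)

toy = [100, 80, 50, 30, 20, 10]
print(nx(toy))                      # 80: 100 + 80 = 180 bp already covers half of 290 bp
print(fraction_in_longest(toy, 2))  # 180 / 290, about 0.62
```

For a real assembly, the FASTA lines would be streamed from the file, e.g. `lengths_from_fasta(open(path))`, and `fraction_in_longest(lengths, 14)` would approximate the Hi-C anchoring rate.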

Phylogenetic analysis and divergence time estimation
The phylogenetic relationships and divergence time between the 10 test species (L. elegans, A.

Analysis of relative evolution rate
The well-aligned, concatenated protein sequences of single-copy orthologs from the ten species were used to estimate relative evolution rates with two methods: Tajima's relative rate test and the two-cluster test. Tajima's relative rate test was performed with MEGA (MEGA Software, RRID:SCR_000667) v10 [58], specifying I. scapularis as the outgroup and testing the relative evolution rate between L. elegans and each of the other species. A χ² test was used to determine which species in each pair had the faster evolution rate. The two-cluster test was performed with LINTRE using the tpcv model [59], again specifying I. scapularis as the outgroup and testing the relative evolution rate between L. elegans and each of the other species.
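The logic of Tajima's relative rate test can be sketched as follows, assuming the 1D form of the test on three aligned sequences: count the sites where only ingroup sequence 1 carries a unique state ($m_1$) and where only ingroup sequence 2 does ($m_2$), then compare them with $\chi^2 = (m_1 - m_2)^2 / (m_1 + m_2)$ on one degree of freedom. Skipping gap columns is an assumption of this sketch; MEGA's exact site handling may differ.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test for two ingroup sequences against an outgroup.

    Returns (m1, m2, chi2, p) where m1/m2 count sites with a state unique to
    seq1/seq2 (the other two sequences share a state).
    """
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue  # skip gapped columns (assumption of this sketch)
        if a != b and b == c:
            m1 += 1  # change on the seq1 lineage
        elif a != b and a == c:
            m2 += 1  # change on the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p
```

A significant result with $m_1 > m_2$ indicates that the first ingroup lineage has evolved faster since the split from the second.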

Positive selection analysis
Single-copy orthologs from five relatively closely related species of orb-web weaving spiders (L. elegans, P. tepidariorum, A. bruennichi, T. clavipes and T. antipodiana), together with their species tree, were used to identify potential positively selected genes (PSGs) and genes with positively selected sites in L. elegans, using the branch model and the branch-site model, respectively, in the codeml program of the PAML package (v4.8) [56]. Briefly, the ratio (ω) of nonsynonymous to synonymous nucleotide substitution rates was estimated. The one-ratio model was used to estimate a single average ω across the species tree (ω0). For each gene, the two-ratio branch model was used to estimate ω separately for the foreground branch (L. elegans; ω1) and for all other branches (ω_background). A likelihood ratio test was performed to compare the two-ratio model with the one-ratio model, and a gene was considered positively selected in the foreground branch if ω1 > ω0, ω1 > ω_background, ω1 > 1 and P < 0.05. If a gene was not positively selected, the branch-site model was used to detect amino acid sites likely to be under positive selection in the foreground branch using Bayes Empirical Bayes (BEB) analysis [56].
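The branch-model decision rule above can be written out as a small sketch. It assumes the log-likelihoods (lnL) and ω estimates have already been read from the codeml output files, and it compares the nested models with a χ² distribution on one degree of freedom (the two-ratio model has one extra ω parameter). The function names are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test with 1 df: statistic 2 * (lnL_alt - lnL_null)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp tiny negative optimiser noise
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def is_positively_selected(lnl_one_ratio, lnl_two_ratio, omega0, omega1,
                           omega_background, alpha=0.05):
    """Apply the criteria: omega1 > omega0, omega1 > omega_background, omega1 > 1, P < alpha."""
    _, p = lrt_pvalue(lnl_one_ratio, lnl_two_ratio)
    return (omega1 > omega0 and omega1 > omega_background
            and omega1 > 1 and p < alpha)
```

For instance, lnL values of -1000.0 (one-ratio) and -997.0 (two-ratio) give a statistic of 6.0 and P ≈ 0.014, so a gene with ω1 = 1.8 against ω0 = 0.2 and ω_background = 0.15 would pass all four criteria.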

Toxin gene analysis
To locate the toxin genes in the genomes of L. elegans and all the other related species, we investigated and downloaded toxin sequences from the ArachnoServer database (v3.0) [60] Table S1).
The 21-mer analysis showed that the genome size of L. elegans is ~1.74 Gb (Figure S1). Then, 106.80 Gb of filtered Nanopore reads (N50 of 24.01 kb, ~61.37-fold coverage of the genome) were obtained (Table S2). The filtered Nanopore reads were assembled into contigs, which were further scaffolded into chromosomes using Hi-C reads (77.58 Gb, ~44.57-fold coverage of the genome) (Table S3). We obtained a 1.57 Gb genome assembly with a contig N50 of 4.34 Mb and a scaffold N50 of 114.31 Mb (Table S4). A total of 14 chromosomes were assembled, ranging in length from 70.40 Mb to 133.92 Mb (Figure 1A, Table S5). This chromosome number is consistent with previous reports for female widow spiders [71]. To validate the completeness and accuracy of the L. elegans genome, we used the mapping ratio of assembled transcripts, the mapping ratio of short reads and Benchmarking Universal Single-Copy Orthologs (BUSCO, v5.2.2).
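The contig and scaffold N50 values reported above follow the standard definition: sort sequence lengths in decreasing order and report the length at which the running total first reaches half of the assembly size. A minimal sketch (the function name `nx` is illustrative):

```python
def nx(lengths, fraction=0.5):
    """Return the Nx of a set of sequence lengths (N50 when fraction = 0.5)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("empty length list")
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # only reached through floating-point rounding
```

Applying the same function to contig lengths gives the contig N50 and to scaffold lengths the scaffold N50; for lengths 2 to 8 (total 35) the running sum 8 + 7 + 6 = 21 first passes 17.5, so the N50 is 6.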
All the assembled transcripts were aligned to the genome, and 77,191 of 85,772 (90.00%) transcripts could be found in the assembled genome (Tables S6-S8). When all the filtered short reads were aligned to the assembled genome, more than 357.88 million reads (99.28%) could be mapped (Table S9). In addition, 251 of 255 (98.4%) core eukaryote genes and 930 of 954 (97.5%) core metazoan genes were identified in the genome (Table S10), and this assembly quality was comparable with that of closely related species (Table S11). We also identified the Hox genes in all 10 species (A. geniculata, S. mimosarum, T. clavipes, A. bruennichi, P. tepidariorum, C. sculpturatus, S. dumicola, I. scapularis, T. antipodiana and L. elegans). L. elegans has two Hox gene clusters, both of which are complete and continuous, comparable to those of other related species (Figure S2) [2,25,44]. Together, these results indicate that the assembled genome has high integrity and accuracy.

Genome annotation
Both tandem repeats and TEs were annotated in the assembled genome, and a total of ~506.09 Mb of repeat sequences were identified, accounting for 32.30% of the assembled genome (Tables S12 and S13). For protein-coding genes, 20,167 genes were annotated, 81.03% of which had homologous hits in public databases (Table S14). These genes were highly similar to those of related species in the distributions of gene length, CDS (coding sequence) length, exon length and exon number (Figure S3). The basic statistics of this genome, including gene density, tandem repeats, LTRs, LINEs, SINEs, DNA TEs and GC content, are shown in Figure 1B. We examined synteny blocks between L. elegans and other arachnid species (S. dumicola, P. tepidariorum and T. clavipes; Figure 1C-1E), and the assembled genome showed good synteny with these species.
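Genome-wide tracks such as GC content and gene density in Figure 1B are usually computed in fixed, non-overlapping windows along each chromosome. The sketch below shows one way to do this; the 1 Mb default window, the choice to ignore non-ACGT bases in the GC denominator, and counting genes by their start coordinate are assumptions for illustration, not the settings used for the figure.

```python
def window_gc(sequence, window=1_000_000):
    """GC fraction per non-overlapping window, ignoring non-ACGT bases (e.g. N gaps)."""
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        rows.append((start, start + len(chunk), gc / acgt if acgt else float("nan")))
    return rows


def window_counts(positions, seq_len, window=1_000_000):
    """Number of features (e.g. gene start coordinates) falling in each window."""
    counts = [0] * ((seq_len + window - 1) // window)
    for pos in positions:
        if 0 <= pos < seq_len:
            counts[pos // window] += 1
    return counts
```

The same `window_counts` helper can be reused for each repeat class (tandem repeats, LTRs, LINEs, SINEs, DNA TEs) by passing the start coordinates of the corresponding annotations.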

Phylogenetic relationship of L. elegans and other related species
To compare the genome of L. elegans with those of other spider species, we identified orthologous and paralogous genes among these species. A total of 28,587 gene families were clustered in the 8 spiders and two outgroup species, and 156 single-copy genes were identified. The phylogenetic relationship of these 10 species was inferred from three datasets of the single-copy genes, each concatenated into a supergene for each species: amino acid sequences, CDS nucleotide sequences and fourfold degenerate synonymous sites (4dTV) [72]. All three methods recovered the same phylogenetic relationship with high bootstrap values (Figures S4-S6), consistent with the well-documented spider tree of life [73]. L. elegans is closely related to P. tepidariorum; both belong to the Theridiidae. Divergence time estimation suggested that the two species diverged ~73.0 million years ago (Mya) (Figure 2A).
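Building the fourfold-degenerate-site supergene can be sketched as below. For each codon-aligned single-copy gene, a third codon position is kept only if, in every species, the first two bases belong to one of the eight fourfold degenerate codon families (CTN Leu, GTN Val, TCN Ser, CCN Pro, ACN Thr, GCN Ala, CGN Arg, GGN Gly). Requiring all species to share the same codon prefix is a conservative assumption of this sketch, and the function names are hypothetical.

```python
# First two codon bases for which any third base encodes the same amino acid.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(aligned_cds):
    """aligned_cds: dict species -> codon-aligned CDS (equal length, multiple of 3).

    Returns dict species -> string of third-position bases at codons that are
    fourfold degenerate, with the same prefix, in every species.
    """
    seqs = {sp: s.upper() for sp, s in aligned_cds.items()}
    length = len(next(iter(seqs.values())))
    if length % 3 or any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = {sp: [] for sp in seqs}
    for i in range(0, length, 3):
        codons = {sp: s[i:i + 3] for sp, s in seqs.items()}
        prefixes = {c[:2] for c in codons.values()}
        if (len(prefixes) == 1 and prefixes.pop() in FOURFOLD_PREFIXES
                and all(c[2] in "ACGT" for c in codons.values())):
            for sp, codon in codons.items():
                sites[sp].append(codon[2])
    return {sp: "".join(bases) for sp, bases in sites.items()}


def supergene(genes):
    """Concatenate fourfold sites across genes (each gene: dict with the same species keys)."""
    parts = [fourfold_sites(gene) for gene in genes]
    return {sp: "".join(part[sp] for part in parts) for sp in genes[0]}
```

The resulting per-species strings form the nucleotide matrix for tree building; the 4dTV statistic is then the proportion of transversions observed at these same sites.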

TE insertion history of Arachnida
We examined the TE types and TE insertion times in all 10 species and found that TE contents differ markedly among them. In the Theridiidae, including L. elegans and P. tepidariorum, TE insertion times are concentrated at 20-35 million years ago, whereas those of the other species are much older. Moreover, the TE insertion peaks of L. elegans (~35 million years ago) and P. tepidariorum (~20 million years ago) are much more recent than the divergence of the two species (~73.0 Mya), which suggests that these TE insertion events may have happened after the two lineages diverged (Figure 2B).
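TE insertion ages are commonly derived from the Kimura two-parameter (K2P) distance between a TE copy and its family consensus, $K = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$, where $P$ and $Q$ are the proportions of transitions and transversions, and then converted to time with a neutral substitution rate $r$. The method used for Figure 2B is not described here, so this is only a sketch under that common convention: copy-to-consensus distances are divided by $r$, while distances between two copies that both diverged (such as the two LTRs of one element) are divided by $2r$. The rate value must be supplied as an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_time(k, rate_per_site_per_year, two_lineages=False):
    """Convert a distance to years: K / r for copy vs consensus, K / (2r) for two diverging copies."""
    divisor = 2 * rate_per_site_per_year if two_lineages else rate_per_site_per_year
    return k / divisor
```

With a hypothetical rate of 1e-9 substitutions per site per year, a copy-to-consensus distance of 0.1 corresponds to about 100 million years, or about 50 million years if the 0.1 separates two diverging copies.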

Relative evolution rate of species
Species in different environments may face different selection pressures, which can be reflected in their relative rates of evolution. The relative evolution rate results showed that L. elegans had the fastest evolution rate among these species, suggesting that it has experienced strong selection pressure. A. geniculata, the only species from the relatively ancient Mygalomorphae, had the slowest evolution rate (Figure S7; Tables S17-S18). This result suggests that relatively ancient spider groups may experience weaker selection pressure in their habitats. However, because the A. geniculata genome assembly is of low quality, this result requires further verification.

Positively selected genes
Using five relatively closely related species of orb-web weaving spiders belonging to the Araneoidea (L. elegans, P. tepidariorum, A. bruennichi, T. clavipes and T. antipodiana), we identified eight positively selected genes and 348 genes with positively selected sites in L. elegans (Table S17). Among these genes, lhx9 was the only one related to gonadal development. We reconstructed the structure of this gene in these species, and that of L. elegans was the most distinct (Figure S8), suggesting that gonadal development in L. elegans may differ from that in the other species. The CaMKI gene belongs to the calcium/calmodulin-dependent protein kinase family, and another member of this family, CaMKII, is associated with octopamine (OA) signaling, which may affect Ca2+ signaling or regulate intracellular cAMP levels in vivo. In the spider Cupiennius salei, CaMKII may also act as a downstream modulator of OA signaling in VS-3 neurons [74], which is related to cell excitability.

Venom gene analysis
The venoms of Latrodectus spp. are famous for their potency and their ability to cause extreme and long-lasting pain. Based on genome-wide comparative analysis, we identified members of four major venom gene families: the CRISP, ICK, latrodectin and latrotoxin families. The severe symptoms of Latrodectus envenomation are largely attributed to latrotoxins [75]. Consistent with this, we found that, compared with other spiders, the latrotoxin family has undergone lineage-specific expansion in L. elegans and the house spider P. tepidariorum: we found 30 latrotoxin genes in P. tepidariorum and 50 in L. elegans, more than the 1-12 found in other spiders (Figure 3A). We also identified six toxins from the CRISP family and two from the ICK family in L. elegans. The latrotoxin and CRISP families are both ancient and relatively conserved and are present in all the tested arachnid species, whereas toxins from the ICK family seem unique to the Araneoidea (Figure 3A). The genic loci of all these venom toxins were mapped onto 11 scaffolds (Chromosomes 2, 3, 5, 7, 8, 9, 10, 11, 12 and 13, and scaffold 39). The majority of latrotoxin genes were located on Chromosome 11, where they showed remarkable tandem duplication (Figure 3B). Phylogenetic analysis showed that the latrotoxins underwent substantial gene duplication and diversification in the two Theridiidae spiders, L. elegans and P. tepidariorum, and that the L. elegans latrotoxins in the most recently expanded clade were mostly located on Chromosome 11 (Figure 3C). We analyzed the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks ratio) of each latrotoxin gene pair and found that L. elegans latrotoxin genes had generally

Table S8. Statistics of the transcript mapping ratio on the assembled genome.
Table S9. Statistics of the short-read mapping ratio on the assembled genome.
Table S10. Quality evaluation of the assembled genome by BUSCO software.
Table S11. Comparison of related genomes with our chromosome-level genome.
Table S12. Statistics of the annotated repeat sequences in our assembled genome.
Table S13. Statistics of the annotated repeat sequences in our assembled genome by de novo prediction.
Table S14. Functional annotation of the predicted protein-coding genes.
Table S15. Relative evolution rate among these species by LINTRE software.
Table S16. Relative evolution rate among these species by MEGA software.

Competing interests
The authors declare that they have no competing interests.

Figure 2